Toponym recognition in custom-made map titles
Abstract
The titles of customized topographic maps form a specific corpus with a very large number of place names and many spelling variations. This paper addresses the identification of toponyms in these titles. Toponym detection relies on gazetteers combined with pattern-based light (shallow) parsing. The method broadens the definition of a toponym to reflect the nature of the corpus and the data it contains. It consists of seven successive stages that take into account both the extralinguistic context (in this case, toponym georeferencing) and the linguistic context. Tagging errors are analyzed in relation to the characteristics of the corpus, and the tagging results of each stage are evaluated using recall, precision, and F-measure. Several conclusions can be drawn: i) toponym recognition in web corpora should take spelling variations into account, ii) toponym recognition cannot be limited to the proper nouns listed in gazetteers, and iii) the notion of a subjective toponym is relevant to this specific corpus and could be considered in relation to the customization of maps.
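The abstract combines gazetteer lookup, pattern-based light parsing, tolerance to spelling variants, and evaluation by recall, precision, and F-measure. A minimal Python sketch can make these ingredients concrete. It is not the authors' seven-stage method and does not model the georeferencing stage. Several parts are illustrative assumptions:

- the gazetteer entries and the trigger expressions (`autour de`, `près de`, `vers`) are hypothetical;
- the example titles and the helper names (`normalize`, `gazetteer_tag`, `tag_title`, `evaluate`) are also hypothetical;
- simple accent and case folding stands in for whatever spelling-variant handling the paper actually uses.

```python
from __future__ import annotations

import re
import unicodedata

# Hypothetical gazetteer: normalized form -> canonical toponym.
GAZETTEER = {
    "mont blanc": "Mont Blanc",
    "chamonix": "Chamonix",
    "lac d annecy": "Lac d'Annecy",
}

# Hypothetical light-parsing pattern: a case-insensitive trigger expression
# followed by one or more capitalized words (ASCII capitals only, a simplification).
PATTERN = re.compile(
    r"\b(?i:autour de|près de|vers)\s+([A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)"
)


def normalize(text: str) -> str:
    """Fold case, strip accents and punctuation so spelling variants collide."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def gazetteer_tag(title: str, max_len: int = 4) -> set[str]:
    """Greedy longest-match lookup of token n-grams in the gazetteer."""
    tokens = normalize(title).split()
    found: set[str] = set()
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first so "lac d annecy" wins over shorter keys.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n])
            if key in GAZETTEER:
                found.add(GAZETTEER[key])
                i += n
                break
        else:  # no n-gram starting at position i is in the gazetteer
            i += 1
    return found


def tag_title(title: str) -> set[str]:
    """Gazetteer stage first, then the pattern stage for names it missed."""
    tags = gazetteer_tag(title)
    for match in PATTERN.finditer(title):
        candidate = match.group(1)
        # Skip pattern hits already found by the gazetteer (spelling-insensitive).
        if normalize(candidate) not in {normalize(t) for t in tags}:
            tags.add(candidate)
    return tags


def evaluate(predicted: list[set[str]],
             gold: list[set[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F-measure over normalized toponyms."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        pred_n = {normalize(p) for p in pred}
        ref_n = {normalize(r) for r in ref}
        tp += len(pred_n & ref_n)   # correctly tagged toponyms
        fp += len(pred_n - ref_n)   # spurious tags
        fn += len(ref_n - pred_n)   # missed toponyms
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


if __name__ == "__main__":
    titles = [
        "Randonnée autour de Chamonix",
        "Le MONT-BLANC vu du Lac d'Annecy",
        "Balade vers Saint Gervais",
        "Sortie au Pic du Midi",  # missed: not in gazetteer, no trigger
        "Vacances vers Noël",     # false positive from the pattern stage
    ]
    gold = [{"Chamonix"}, {"Mont Blanc", "Lac d'Annecy"}, {"Saint Gervais"},
            {"Pic du Midi"}, set()]
    predicted = [tag_title(t) for t in titles]
    print("P=%.2f R=%.2f F=%.2f" % evaluate(predicted, gold))
```

The sketch runs on five made-up titles. It finds four of the five reference toponyms. It misses "Pic du Midi" and wrongly tags "Noël", which gives precision, recall, and F-measure of 0.80 each.

These two errors mirror two of the abstract's conclusions. First, a gazetteer alone misses names, so recognition cannot be limited to gazetteer proper nouns. Second, normalization is what lets "MONT-BLANC" match "Mont Blanc", so spelling variation has to be handled explicitly.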